linear kernel
GEQ: Gaussian Kernel Inspired Equilibrium Models
Despite the connection that optimization-induced deep equilibrium models (OptEqs) establish between their outputs and an underlying hidden optimization problem, their performance, along with that of related works, still falls short, especially when compared with deep networks. One key factor behind this limitation is the use of linear kernels to extract features in these models. To address this issue, we propose replacing the linear kernel with a new function that can readily capture nonlinear feature dependencies in the input data. Drawing inspiration from classical machine learning algorithms, we introduce Gaussian kernels as the alternative and propose a new equilibrium model, which we refer to as GEQ. By leveraging Gaussian kernels, GEQ can effectively extract the nonlinear information embedded in the input features, surpassing the performance of the original OptEqs. Moreover, GEQ can be viewed as a weight-tied neural network with infinite width and depth, and it enjoys better theoretical properties and improved overall performance. GEQ also exhibits enhanced stability when confronted with varied samples. We substantiate the effectiveness and stability of GEQ through a series of comprehensive experiments.
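To make the kernel swap concrete, the toy sketch below shows a weight-tied equilibrium layer whose update scores the hidden state against a set of anchor points with a Gaussian kernel instead of applying a linear map. Everything here (the `anchors`, `gamma`, the tanh update, and plain fixed-point iteration as the solver) is an illustrative assumption, not the architecture from the paper.

```python
# Minimal sketch (not the authors' implementation): an equilibrium layer whose
# update uses a Gaussian-kernel feature map against learned anchor points in
# place of a linear map. All names here are illustrative assumptions.
import numpy as np

def gaussian_features(z, anchors, gamma=1.0):
    """RBF similarities between state z (d,) and each anchor row (m, d)."""
    sq_dists = np.sum((anchors - z) ** 2, axis=1)
    return np.exp(-gamma * sq_dists)              # shape (m,)

def equilibrium_layer(x, anchors, U, gamma=1.0, tol=1e-6, max_iter=500):
    """Solve z* = tanh(U @ phi(z*) + x) by naive fixed-point iteration."""
    z = np.zeros_like(x)
    for _ in range(max_iter):
        z_next = np.tanh(U @ gaussian_features(z, anchors, gamma) + x)
        if np.linalg.norm(z_next - z) < tol:
            break
        z = z_next
    return z

rng = np.random.default_rng(0)
d, m = 8, 16                                      # state dim, number of anchors
x = rng.normal(size=d)                            # input injection
anchors = rng.normal(size=(m, d))
U = rng.normal(size=(d, m)) * 0.1                 # small scale aids convergence
print(equilibrium_layer(x, anchors, U))
```

Because the Gaussian feature map is bounded, a modest scale on `U` is enough for this toy iteration to converge in practice.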
We Still Don't Understand High-Dimensional Bayesian Optimization
Colin Doumont, Donney Fan, Natalie Maus, Jacob R. Gardner, Henry Moss, Geoff Pleiss
High-dimensional spaces have challenged Bayesian optimization (BO). Existing methods aim to overcome this so-called curse of dimensionality by carefully encoding structural assumptions, from locality to sparsity to smoothness, into the optimization procedure. Surprisingly, we demonstrate that these approaches are outperformed by arguably the simplest method imaginable: Bayesian linear regression. After applying a geometric transformation to avoid boundary-seeking behavior, Gaussian processes with linear kernels match state-of-the-art performance on tasks with 60- to 6,000-dimensional search spaces. Linear models offer numerous advantages over their non-parametric counterparts: they afford closed-form sampling and their computation scales linearly with data, a fact we exploit on molecular optimization tasks with > 20,000 observations. Coupled with empirical analyses, our results suggest the need to depart from past intuitions about BO methods in high-dimensional spaces.
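A minimal sketch of the headline idea, under my own assumptions about priors and candidate generation (the geometric transformation the paper applies to the search space is omitted): conjugate Bayesian linear regression gives a closed-form weight posterior, and exact Thompson sampling from it drives the optimization loop.

```python
# Hedged sketch of the baseline idea: Bayesian linear regression with
# closed-form posterior sampling as a high-dimensional BO inner loop.
# Priors, the candidate set, and the toy objective are my assumptions.
import numpy as np

def posterior(X, y, alpha=1.0, noise=0.1):
    """Closed-form conjugate posterior over weights for y = X @ w + eps."""
    d = X.shape[1]
    precision = alpha * np.eye(d) + (X.T @ X) / noise**2
    cov = np.linalg.inv(precision)
    cov = (cov + cov.T) / 2                 # guard against numerical asymmetry
    mean = cov @ (X.T @ y) / noise**2
    return mean, cov

def thompson_next(X, y, candidates, rng):
    """Draw w from the posterior and pick the candidate it scores highest."""
    mean, cov = posterior(X, y)
    w = rng.multivariate_normal(mean, cov)  # exact sample, no MCMC needed
    return candidates[np.argmax(candidates @ w)]

rng = np.random.default_rng(0)
d = 60                                      # a "high-dimensional" toy space
w_true = rng.normal(size=d)
X = rng.uniform(-1, 1, size=(10, d))        # initial design
y = X @ w_true                              # toy linear objective, noiseless

for _ in range(20):                         # BO loop
    cands = rng.uniform(-1, 1, size=(256, d))
    x_next = thompson_next(X, y, cands, rng)
    X = np.vstack([X, x_next])
    y = np.append(y, x_next @ w_true)

print("best value found:", y.max())
```

Note how the per-step cost is dominated by a single d-by-d solve, which is what lets this approach scale to tens of thousands of observations.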
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- North America > Canada > British Columbia (0.04)
- North America > United States > Pennsylvania > Lancaster County > Lancaster (0.04)
- North America > Canada > Ontario (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)
Kernel Identification Through Transformers
Kernel selection plays a central role in determining the performance of Gaussian Process (GP) models, as the chosen kernel determines both the inductive biases and prior support of functions under the GP prior. This work addresses the challenge of constructing custom kernel functions for high-dimensional GP regression models. Drawing inspiration from recent progress in deep learning, we introduce a novel approach named KITT: Kernel Identification Through Transformers. KITT exploits a transformer-based architecture to generate kernel recommendations in under 0.1 seconds, which is several orders of magnitude faster than conventional kernel search algorithms. We train our model using synthetic data generated from priors over a vocabulary of known kernels. By exploiting the nature of the self-attention mechanism, KITT is able to process datasets with inputs of arbitrary dimension. We demonstrate that kernels chosen by KITT yield strong performance over a diverse collection of regression benchmarks.
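The mechanism described above can be caricatured in a few lines: treat each (x_i, y_i) pair as a token, let self-attention pool over the whole set (which makes the model indifferent to the number and order of points), and classify into a fixed kernel vocabulary. This is a hedged toy, not the published architecture; the vocabulary, layer sizes, and token encoding are my assumptions, and the real model's handling of arbitrary input dimension is not reproduced by this fixed-width embedding.

```python
# Toy KITT-like classifier: self-attention over (x, y) tokens, mean-pooled,
# then mapped to logits over an assumed kernel vocabulary. Untrained, so the
# predictions below are arbitrary; the point is the shape of the pipeline.
import torch
import torch.nn as nn

KERNEL_VOCAB = ["RBF", "Matern32", "Periodic", "Linear"]   # assumed vocabulary

class ToyKITT(nn.Module):
    def __init__(self, input_dim, token_dim=32, heads=4):
        super().__init__()
        self.embed = nn.Linear(input_dim + 1, token_dim)   # token = [x_i, y_i]
        self.attn = nn.TransformerEncoderLayer(
            d_model=token_dim, nhead=heads, batch_first=True)
        self.head = nn.Linear(token_dim, len(KERNEL_VOCAB))

    def forward(self, X, y):
        tokens = self.embed(torch.cat([X, y.unsqueeze(-1)], dim=-1))
        pooled = self.attn(tokens).mean(dim=1)   # permutation-invariant pool
        return self.head(pooled)                 # logits over kernel names

model = ToyKITT(input_dim=2)
X = torch.randn(8, 50, 2)                        # 8 datasets, 50 points each
y = torch.sin(X.sum(-1))                         # toy regression targets
logits = model(X, y)
print([KERNEL_VOCAB[i] for i in logits.argmax(-1).tolist()])
```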
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.15)
- Europe > United Kingdom > England > Greater London > London (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)
- Asia > Singapore (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Information Technology > Hardware (0.93)
- Information Technology > Data Science > Data Mining (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
3210ddbeaa16948a702b6049b8d9a202-Reviews.html
The paper studies the collision probability of the following hashing scheme for points in R^D: the hash function picks a random vector of D i.i.d. alpha-stable entries and hashes each point to the sign of its projection onto that vector. The same scheme with a 2-stable distribution has been studied before (known as sim-hash). The main result shows that the collision probability for alpha = 1 on binary data can be approximated by a function of the chi-square similarity.

The bound for general data is, as the authors themselves point out, far from the true collision probability, so it is not clear what it means, especially regarding the comparison between that bound and the chi-square similarity. The paper provides experiments showing that the collision probability is approximately equal to certain functions of the chi-square similarity, so one can approximate the chi-square similarity from the collision probability via these functions. However, in this process I think we are no longer able to use a linear SVM or efficient near-neighbor search (advantages 2 and 3 in the introduction) and have to use a kernel SVM and exhaustive search instead. When using a linear SVM, we use the kernel implicitly defined by the LSH, and there is inadequate explanation of how useful that kernel is (the chi-square similarity is useful, but that does not mean any function of it is also useful).
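For concreteness, here is a small empirical check of the scheme as the review describes it, under my reconstruction: sign projections with i.i.d. Cauchy entries (alpha = 1) compared against a sim-hash-style approximating function of the chi-square similarity. The data, dimensions, and the specific approximation are illustrative assumptions, not taken from the paper.

```python
# Empirical illustration (my reconstruction of the reviewed scheme): the
# collision rate of sign Cauchy projections versus a guessed sim-hash-style
# function of the chi-square similarity of two nonnegative vectors.
import numpy as np

rng = np.random.default_rng(0)
D, K = 128, 20000                         # data dimension, number of hashes

x = rng.random(D) + 1e-6                  # two correlated nonnegative vectors
y = x + 0.3 * rng.random(D)
x, y = x / x.sum(), y / y.sum()           # chi-square similarity assumes sum 1

R = rng.standard_cauchy(size=(K, D))      # alpha = 1 stable projection vectors
collision = np.mean(np.sign(R @ x) == np.sign(R @ y))

chi2 = np.sum(2 * x * y / (x + y))        # chi-square similarity
approx = 1 - np.arccos(chi2) / np.pi      # assumed sim-hash-style approximation

print(f"empirical collision rate : {collision:.3f}")
print(f"chi-square approximation : {approx:.3f}")
```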
- Asia > Middle East > Lebanon (0.04)
- North America > United States > Nevada (0.04)
- Research Report (0.66)
- Summary/Review (0.49)
- Asia > Japan > Honshū > Kansai > Osaka Prefecture > Osaka (0.04)
- Europe > Switzerland > Zürich > Zürich (0.04)
- Europe > Switzerland > Basel-City > Basel (0.04)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Virginia > Arlington County > Arlington (0.04)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.04)
- Research Report > Strength High (1.00)
- Research Report > Experimental Study (1.00)